-
Notifications
You must be signed in to change notification settings - Fork 64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: preserve the order of the keys #211
Conversation
a6a5a8f
to
c87a078
Compare
@zookzook do you prefer to discuss here or back at #211? I'll provide here some initial feedback related to the present implementation. First off, thanks for taking the time to implement this extra functionality ❤️ SetupHere's how I set this up on a Livebook: Mix.install(
[
{:csv, "~> 3.0"},
{:exqlite, "~> 0.13.15"},
{:kino_db, "~> 0.2.2"},
{:kino_explorer, "~> 0.1.10"},
{:kino,
github: "livebook-dev/kino", ref: "06625859ef367c50169a280ae5fe33cf9a49a8f3", override: true},
{:mongodb_driver, github: "zookzook/elixir-mongodb-driver", ref: "preserve-order"},
{:tzdata, "~> 1.1"}
],
config: [
elixir: [time_zone_database: Tzdata.TimeZoneDatabase],
mongodb_driver: [decoder: BSON.PreserverOrderDecoder]
]
) Query with explicit projectionAnd an example query: Mongo.find(mongo, "Message", %{},
sort: %{_id: -1},
limit: 10,
projection: [_id: 0, from: "$user.name", at: "$created_at", message: "$data"])
# |> Enum.to_list()
|> Enum.map(fn doc ->
doc
|> Map.get(:order)
|> Enum.map(fn key -> {String.to_atom(key), Map.get(doc, key)} end)
end) Since I pass a I know generating atoms dynamically has its perils, but in this case it seems under control and does improve readability of the output: Query without a projectionNext query, without a projection: Mongo.find(mongo, "Message", %{},
sort: %{_id: -1},
limit: 10)
# |> Enum.to_list()
|> Enum.map(fn doc ->
doc
|> Map.get(:order)
|> Enum.map(fn key -> {String.to_atom(key), Map.get(doc, key)} end)
end) The query above represents better what I was originally looking for: exploring a collection and getting the fields in the order that MongoDB has them stored. This query reviewed an obvious shortcoming of my code to transform the result: it should be applied recursively to handle subdocuments. This got me thinking maybe I could write a custom decoder that handles recursion while parsing the response from Mongo, and such that I don't need to post-process query results. @zookzook what do you think? |
Keep in mind, that the result of the decoder is used by the driver. If you would change the result by converting the binaries into atoms the whole driver will not work anymore. Try to use the collection or write just a little recursive function that converts the data into the desired format. |
I'm not sure I fully understand the implications. Could you give me a code example (perhaps a link to library source on GitHub) where this is happening?
I read https://hexdocs.pm/mongodb_driver/1.1.0/Mongo.Collection.html, and understood that to use I'll try the recursive function. |
Check this module: |
Is it that From the Then looking at elixir-mongodb-driver/lib/mongo.ex Lines 1504 to 1510 in 9e4d569
I could guess that (...) I did find https://github.com/zookzook/elixir-mongodb-driver/blob/preserve-order/lib/mongo/messages.ex, so the "messages from Mongo server" are treated the same as the documents within a query cursor, is that right? So if the result of running a command would not be represented as a map, then the library would error out trying to access map keys or doing pattern matching. |
In the meantime, I wrote the recursive function: defmodule Mongo.Document do
def to_list(doc, order_key \\ :order) when is_map(doc) and not is_struct(doc) do
doc
|> Map.pop(order_key, [])
|> then(fn {order, doc} ->
doc
|> Map.to_list()
|> Enum.sort_by(fn {key, _} -> Enum.find_index(order, &(&1 == key)) end)
end)
|> Enum.map(fn
{key, value} when is_map(value) and not is_struct(value) -> {key, to_list(value, order_key)}
{key, value} -> {key, value}
end)
end
end That could help me (or someone else in the future) go from the proposed map + :order key to an ordered collection: Mongo.find(mongo, "Message", %{},
sort: %{_id: -1},
limit: 10
)
|> Enum.map(&Mongo.Document.to_list/1) |
No, there are a lot of pattern matches: with {:ok, %{"cursorsAlive" => [], "cursorsNotFound" => [], "cursorsUnknown" => [], "ok" => ok}} when ok == 1 <- Mongo.exec_command_session(session, cmd, stream.opts) do
:ok
end
Yes, the response is like a wrapper document that contains the "real" documents.
Yes, that's right! |
What about: def to_list(doc, order_key \\ :order) do
doc
|> Map.get(order_key, [])
|> Enum.map(fn key -> {key, Map.get(doc, key)} end)
|> Enum.map(fn {key, value} -> {key, eval(value, order_key)} end)
end
defp eval(%{__struct__: _other} = elem, _order_key) do
elem
end
defp eval(doc, order_key) when is_map(doc) do
to_list(doc, order_key)
end
defp eval(xs, order_key) when is_list(xs) do
Enum.map(xs, fn elem -> eval(elem, order_key) end)
end
defp eval(elem, _order_key) do
elem
end the find-index-function may result in a bottleneck. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Michael, thanks again for this PR ❤️
Thanks to you I understand a bit more of the underlying implementation of the library and can achieve what I wanted for exploring Mongo collections 💯
I wanted to leave a few suggestions for the documentation if this ends up in a future release.
samples/decoder.exs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing this as an experiment to compare benchmarks with old vs new decoder 😄
How is this file meant to be used? Should it be in the existing ./examples
directory instead of a new one?
The doc
is represented as a map, so encoding it doesn't preserve any well-defined order. Same for the documents within items
-- IIUC the fields :id
, :age
and :name
could end up serialized in any order and the order could even be different for different items.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's right. I need to remove it:
Original encoder 3 9.54 K 104.79 μs ±6.53% 103.79 μs 130.19 μs
Original encoder 2 9.53 K 104.93 μs ±6.48% 104.13 μs 129.92 μs
Original encoder 9.50 K 105.24 μs ±7.51% 103.63 μs 138.92 μs
Original encoder 1 9.26 K 107.97 μs ±19.33% 105.75 μs 150.14 μs
Original encoder 3 15.00 K 66.66 μs ±7.15% 66.13 μs 76.91 μs
Original encoder 2 14.99 K 66.72 μs ±8.32% 66.21 μs 76.29 μs
Original encoder 1 14.97 K 66.82 μs ±7.40% 66.25 μs 76.78 μs
Original encoder 14.96 K 66.85 μs ±7.80% 66.33 μs 76.67 μs
The new encode is faster than the old one.
README.md
Outdated
end | ||
|
||
defmodule BSON.PreserverOrderDecoder do | ||
@moduledoc false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be willing to post the relevant docs to this module in Hexdocs.pm? I'd volunteer to help write it, since getting docs either on hexdocs or IDE/language server would help discover this feature and decide whether one needs it or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course.
README.md
Outdated
decoder: MyPreserverOrderDecoder | ||
|
||
``` | ||
The decode module is defined at compiler time. The driver provides two types of decoder: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decode module is defined at compiler time. The driver provides two types of decoder: | |
The decoder module is defined at compile time. The driver provides two types of decoder: |
Nice! I didn't consider performance. For my current needs it doesn't really matter. I'm running one-off queries on Livebook :) Your version reads much better than mine. Neither of us wrote tests, though 😊! I did manual-test with a few queries -- and while handling the ObjectId struct case I realized that depending on configuration, the I think your |
Co-authored-by: Rodolfo Carvalho <[email protected]>
Co-authored-by: Rodolfo Carvalho <[email protected]>
Co-authored-by: Rodolfo Carvalho <[email protected]>
Co-authored-by: Rodolfo Carvalho <[email protected]>
Co-authored-by: Rodolfo Carvalho <[email protected]>
Co-authored-by: Rodolfo Carvalho <[email protected]>
Perhaps we should rename the default name |
Hey, I still have this in mind, just bandwidth has been limited :) Mapping out our next steps here:
By the way, today I learned about |
I've solved the merge conflict in a local branch and I'm writing docs. Wanted an opinion regarding configuring an alternative key instead of I thought exposing What do you think of
config :mongodb_driver,
decoder: BSON.OrderPreservingDecoder
config :mongodb_driver,
decoder: {BSON.OrderPreservingDecoder, key: :original_order} I also briefly considered not introducing |
Ah, worth contrasting the above suggestion with what we have currently:
(and on top of creating a module, one would still need to update the config too!) |
(I was reading a blog post looking for inspiration...) A disadvantage of setting the decoder globally is that we cannot choose to open two connections with different behaviors, they are forced to use a single decoder implementation. In tests we can still test each decoder module independently, that's okay -- I wrote some basic tests for each of the implementations we have, with and without order. @zookzook what if we make the decoder configurable per connection? Apparently it would require wiring things together in |
* fix: handle read preferences without tags * Fixed minor bug parsing url connection. Updated dependencies libraries. (#209) * Fixed minor bug parsing url connection. Updated dependencies libraries. * Incremented max_nesting option in credo refactor nestring * Changed function ReadPreference.primary() by ReadPreference.merge_defaults() in topology test file --------- Co-authored-by: Juan Antonio Jiménez <[email protected]> Co-authored-by: Michael Maier <[email protected]> * fix: applies the global timeout value to each query (#215) * Document BSON.Decoder and BSON.PreserveOrderDecoder But keep the documentation for those modules private for now. Expand the documentation on Data Representation and how to deal with preserving key order when encoding and decoding data. * Test BSON decoders * Document how to go from annotated maps with order keys to lists --------- Co-authored-by: zookzook <[email protected]> Co-authored-by: Juan Antonio <[email protected]> Co-authored-by: Juan Antonio Jiménez <[email protected]>
No description provided.